MPI Reduction Operations for Sparse Floating-point Data

نویسندگان

  • Michael Hofmann
  • Gudula Rünger
چکیده

This paper presents a pipeline algorithm for MPI Reduce that uses a Run Length Encoding (RLE) scheme to improve the global reduction of sparse floating-point data. The RLE scheme is directly incorporated into the reduction process and causes only low overheads in the worst case. The high throughput of the RLE scheme allows performance improvements when using high performance interconnects, too. Random sample data and sparse vector data from a parallel FEM application is used to demonstrate the performance of the new reduction algorithm for an HPC Cluster with InfiniBand interconnects.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Sparse Matrix-Vector Multiplication on FPGAs

Floating-point Sparse Matrix-Vector Multiplication (SpMXV) is a key computational kernel in scientic and engineering applications. The poor data locality of sparse matrices signicantly reduces the performance of SpMXV on general-purpose processors, which rely heavily on the cache hierarchy to achieve high performance. The abundant hardware resources on current FPGAs provide new opportunities to...

متن کامل

An MPI-CUDA Implementation and Optimization for Parallel Sparse Equations and Least Squares (LSQR)

LSQR (Sparse Equations and Least Squares) is a widely used Krylov subspace method to solve large-scale linear systems in seismic tomography. This paper presents a parallel MPI-CUDA implementation for LSQR solver. On CUDA level, our contributions include: (1) utilize CUBLAS and CUSPARSE to compute major steps in LSQR; (2) optimize memory copy between host memory and device memory; (3) develop a ...

متن کامل

The Algorithms for FPGA Implementation of Sparse Matrices Multiplication

In comparison to dense matrices multiplication, sparse matrices multiplication real performance for CPU is roughly 5–100 times lower when expressed in GFLOPs. For sparse matrices, microprocessors spend most of the time on comparing matrices indices rather than performing floating-point multiply and add operations. For 16-bit integer operations, like indices comparisons, computational power of t...

متن کامل

Collapsing floating-point operations

This paper addresses the issue of collapsing dependent floatingpoint operations. The presentation focuses on studying the dataflow graph of benchmark involving a large number of floating-point instructions. In particular, it focuses on the relevance of new floating-point operators performing two dependent operations which are similar to “fused multiply and add”. Finally, this paper examines the...

متن کامل

Sparse Non-blocking Collectives in Quantum Mechanical Calculations

For generality, MPI collective operations support arbitrary dense communication patterns. However, in many applications where collective operations would be beneficial, only sparse communication patterns are required. This paper presents one such application: Octopus, a production-quality quantum mechanical simulation. We introduce new sparse collective operations defined on graph communicators...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008